Crowd-sourced data coding for the social sciences: massive non-expert human coding of political texts
Abstract
A large part of empirical social science relies heavily on data that are not observed in the field, but are generated by researchers sitting at their desks. Clearly, third-party users of such coded data must satisfy themselves as to both its reliability and its validity. This paper discusses some of these matters for a widely used type of coded data, derived from content analysis of political texts. Comparing multiple “expert” and crowd-sourced codings of the same texts with each other, and with independent estimates of the same latent quantities, we assess the extent to which we can estimate these quantities reliably using the cheap and scalable method of crowd-sourcing. Our results show that, contrary to naive preconceptions and reflecting concerns that are often swept under the carpet, a set of expert coders is also a crowd. We find that deploying a crowd of non-expert coders on the same texts raises issues relating to coding quality that need careful consideration. If these issues can be resolved by careful specification and design, crowd-sourcing offers the prospect of cheap, scalable and replicable text coding. While these results concern text coding, we see no reason why they should not extend to other forms of expert-coded data in the social sciences.
Paper prepared for presentation at the 70th Annual Conference of the Midwest Political Science Association, Palmer House Hotel, Chicago, 12-15 April 2012. We thank Joseph Childress at CrowdFlower for assisting with the setup of the crowd-sourcing platform. This research was supported financially by the European Research Council grant ERC-2011-StG 283794-QUANTESS.
Similar resources
Crowd-sourced data coding for the social sciences: massive non-expert coding of political texts
A large part of empirical social science relies heavily on data that are not observed in the field, but are generated by researchers sitting at their desks, raising obvious issues of both reliability and validity. This paper addresses these issues for a widely used type of coded data, derived from the content analysis of political text. Comparing estimates derived from multiple “expert” and cro...
Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data
Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to dis...
Content Analysis by the Crowd: Assessing the Usability of Crowdsourcing for Coding Latent Constructs
Crowdsourcing platforms are commonly used for research in the humanities, social sciences and informatics, including the use of crowdworkers to annotate textual material or visuals. Utilizing two empirical studies, this article systematically assesses the potential of crowdcoding for less manifest contents of news texts, here focusing on political actor evaluations. Specifically, Study 1 compar...
Proposing a model for entrepreneurship opportunities and challenges in online social networks in Iran
Human life has been affected by new communications in recent years. The development of cyberspace has led to businesses embracing social networks. The appearance of the cyberspace and features offered by information technology (IT) has provided hope, wishes, opportunities, and challenges for business owners. Organizations create opportunities for improving productivity, market share and value, ...
Induction of apoptosis and necrosis in human acute erythroleukemia cells by inhibition of long non-coding RNA PVT1
Recent advances in molecular medicine have proposed new therapeutic strategies for cancer. One of the molecular research lines for the diagnosis and treatment of cancer is the use of long non-coding RNAs (LncRNAs) which are a class of non-coding RNA molecules longer than 200 base pairs in length that act as the key regulator of gene expression. Different aspects of cellular activities like cell...
Publication date: 2012